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Abstract 

Background: Pancreatic cancer is a highly lethal cancer with limited diagnostic and therapeutic modalities. 

Methods: To begin to explore the genomic landscape of pancreatic cancer, we used massively parallel sequencing 
to catalog and compare transcribed regions and potential regulatory elements in two human cell lines derived 
from normal and cancerous pancreas. 

Results: By RNA-sequencing, we identified 2,146 differentially expressed genes in these cell lines that were 
enriched in cancer related pathways and biological processes that include cell adhesion, growth factor and 
receptor activity, signaling, transcription and differentiation. Our high throughput Chromatin immunoprecipitation 
(ChIP) sequence analysis furthermore identified over 100,000 regions enriched in epigenetic marks, showing either 
positive (H3K4me1, H3K4me3, RNA Pol II) or negative (H3K27me3) correlation with gene expression. Notably, an 
overall enrichment of RNA Pol II binding and depletion of H3K27me3 binding were seen in the cancer derived cell 
line as compared to the normal derived cell line. By selecting genes for further assessment based on this difference, 
we confirmed enhanced expression of aldehyde dehydrogenase 1A3 (ALDH1A3) in two larger sets of pancreatic 
cancer cell lines and in tumor tissues as compared to normal derived tissues. 

Conclusions: As aldehyde dehydrogenase (ALDH) activity is a key feature of cancer stem cells, our results indicate 
that a member of the ALDH superfamily, ALDH1A3, may be upregulated in pancreatic cancer, where it could mark 
pancreatic cancer stem cells. 
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Background 

Although pancreatic cancer is the tenth most commonly 
diagnosed cancer in the U.S., it is the fourth most com- 
mon cause of cancer mortality with close to 37,000 
deaths per year in the U.S. [1]. The aggressive nature of 
this cancer is further emphasized by the dismal five-year 
survival rate of less than 5% [2]. Pancreatic ductal 
adenocarcinoma (PDAC), the most common form of 
pancreatic cancer, is thought to arise through a multistep 
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process involving intermediate precursor lesions known 
as pancreatic intraepithelial neoplasias (PanlNs) [3]. 
Genome-wide sequencing approaches have revealed a 
complex set of somatic alterations in pancreatic tumors 
such as single base mutations, amplifications, deletions 
and complex rearrangements that drive cancerous growth 
through specific signaling pathways [4,5]. The genes most 
frequently altered in pancreatic cancer are KRAS, TP53, 
CDKN2A and SMALM, leading to an activation of growth 
promoting and cell survival pathways, and inactivation of 
tumor suppressors and apoptotic pathways [4] . 

In addition to an accumulation of somatic mutations 
that affect the DNA sequence directly, epigenetic events 
can lead to an increase in cell growth or survival and 
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thereby contribute to tumor development [6]. DNA 
methylation and post-translational modification of his- 
tone proteins are epigenetic mechanisms that regulate 
gene expression in normal and cancerous cells. DNA 
methylation at specific genes can increase or decrease in 
cancer and is often associated with an inactivation of 
tumor suppressor genes [7]. Post-translational modifica- 
tion of histones mediates appropriate gene expression by 
regulating access of the transcriptional machinery to the 
underlying DNA [8]. Changes in DNA methylation have 
been investigated in pancreatic cancer [9-11] but gen- 
ome wide maps of histone modification patterns have 
not been reported in this disease. 

To enhance our understanding of the pancreatic can- 
cer genome, we catalogued transcribed sequences 
and potential functional elements in cell lines derived 
from normal and neoplastic pancreatic tissues by next 
generation RNA-sequencing and chromatin immunopre- 
cipitation (ChIP) with massively parallel sequencing. Dif- 
ferentially expressed genes and pathways were identified, 
and gene expression was correlated to histone modifica- 
tion patterns that mark active and repressed chromatin. 

Methods 

Cell lines and cell culture 

Two human pancreatic cell lines were purchased from 
ATCC: hTERT-HPNE (derived from normal ductal 
pancreatic tissue, immortalized with the TERT gene; 
modal chromosome number = 46) [12,13] and PANC-1 
(derived from pancreatic ductal adenocarcinoma; modal 
chromosome number = 63) [14]. The hTERT-HPNE cell 
line was cultured in 75% DMEM (ATCC), 25% Medium 
M3 Base (Incell Corporation), supplemented with 5% 
fetal bovine serum (FBS), 10 ng/ml human recombinant 
EGF, 5.5 mM D-glucose (1 g/L) and 750 ng/ml puro- 
mycin. The PANC-1 cell line was cultured in DMEM 
(ATCC) supplemented with 10% FBS. Cells were grown 
to log phase and harvested for RNA and ChIP sequen- 
cing experiments at approximately 60-80% confluency. 
The cell lines were harvested at passage 18 (PANC-1) 
and 23 (hTERT-HPNE) after purchase from ATCC. 

RNA preparation 

Total RNA was isolated from ~5xl0 cells using the 
mirVana™ miRNA isolation kit (Ambion) according to 
the manufacturer's protocol (retaining the small RNA 
fraction in the pool of total RNA). DNAse treatment 
was performed with TURBO DNase (Ambion); RNA 
quantity was assessed by Nano Drop (Thermo Scientific) 
and RNA quality by using the Agilent 2100 Bioanalyzer 
(Agilent Technologies). RNA quality (RIN) scores for 
RNA prepared from cell lines were > 9.0. 



ChIP and template preparation 

Chromatin immunoprecipitation (ChIP) was performed 
with the ChlP-IT™ Express kit (Active Motif) according 
to the manufacturer's protocol. Briefly, cells were grown 
to log phase and cross -linked with 1% formaldehyde for 
10 minutes at room temperature. The fixation reaction 
was stopped by a glycine stop solution and cells were 
washed in PBS and homogenized in lysis buffer 
supplemented with protease inhibitors. Shearing was 
performed by sonication in a Diagenode Bioruptor 
(30-sec on/off pulses for 15-20 min at high setting) to 
obtain 200-500 bp DNA fragments. After centrifugation, 
the supernatant was used for chromatin immunoprecipi- 
tation (from ~lxl0 cells each) using antibodies for: 
anti-mono-methylated histone H3 at lysine 4 (H3K4 
mel) (Abeam, ab8895), anti-tri-methylated histone H3 
at lysine 4 (H3K4me3) (Abeam, abl2209), anti-tri-meth- 
ylated histone H3 at lysine 27 (H3K27me3) (Abeam, 
ab6002) and anti-RNA polymerase II (Pol II) (Abeam, 
ab817). Input samples consisted of 10 uL sonicated 
chromatin. For each ChIP, 20 ug sheared chromatin was 
incubated overnight at 4°C on an end-to-end rotator, 
with 8ug antibody, protein G magnetic beads and prote- 
ase inhibitors. Finally, the ChIP and input DNA samples 
were reverse cross-linked, treated with proteinase K 
and purified with the QIAquick PCR Purification Kit 
(Qiagen). 

Library preparation and sequencing 

For mRNA-seq sample preparation, the mRNA-seq 
Sample Prep Kit (Illumina) was used according to the 
manufacturer's protocol. Briefly, 1 ug total RNA was 
used for polyA mRNA selection using Sera-mag oligo 
(dT) beads, followed by thermal fragmentation at 94°C. 
First strand cDNA synthesis was performed using Super- 
script II reverse transcriptase and random primers. 
Second strand synthesis was performed using DNA 
Polymerase I followed by end repair with T4 DNA poly- 
merase, Klenow DNA polymerase and T4 polynucleotide 
kinase (PNK). Finally, the double-stranded cDNA was 
ligated to Illumina paired-end (PE) adaptors and size se- 
lection performed for fragments in the 350±25 bp range. 
Libraries were amplified using Phusion DNA polymer- 
ase, followed by purification with the QIAquick PCR 
Purification Kit (Qiagen). Amplified libraries were di- 
luted with Elution Buffer to a final concentration of 10 
nM and run at a concentration of 4-6 pM on two Gen- 
ome Analyzer (GAII) lanes. 

MicroRNA-seq sample preparation was performed 
with the Small RNA vl.5 sample prep Kit (Illumina) 
according to the manufacturer's protocol. Briefly, 3' and 
5' small RNA adapters were ligated to 10 ug total RNA 
followed by reverse transcription with Superscript II and 
amplification using Phusion DNA polymerase. This was 
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followed by size selection of cDNA in the 100-108 bp 
range and measurement using a Bioanalyzer. Amplified 
miRNA-seq libraries were diluted with Elution Buffer to 
a final concentration of 10 nM and run at a concentra- 
tion of 4-6 pM on one Genome Analyzer (GAIIx) lanes. 

For ChlP-seq sample preparation, pooled ChIP reac- 
tions (10 ng) were used to prepare a single library using 
the ChlP-seq Sample Prep Kit (Illumina) according to 
the manufacturer's protocol. Briefly, DNA ends were 
repaired with T4 DNA polymerase, Klenow polymerase 
and T4 PNK. The Klenow fragment (3 ' to 5 ' exo minus) 
was then used to generate a protruding 3' A base and 
adapter oligos were ligated using DNA ligase. ChIP 
DNA in the 200±25 bp range was recovered and ampli- 
fied using Phusion polymerase. Library concentration 
was assessed using a Bioanalyzer. Amplified libraries 
were diluted with Elution Buffer to a final concentration 
of 10 nM and run at a concentration of 4-6 pM on 2-3 
Genome Analyzer (GAII or GAIIx) lanes. All next- 
generation sequencing was carried out at the National 
Cancer Institute's Center for Cancer Research Sequen- 
cing Facility operated by the Advanced Technology 
Program of SAIC-Frederick, Inc. (Gaithersburg, MD). 

Sequence alignment and analysis 

mRNA-sequencing: paired-end RNA-seq was repeated 
twice for each cell line to obtain reliable numbers of se- 
quence reads (Additional file 1: Table S1A). The median 
number of reads (35 bp) generated per paired-end ex- 
periment was 8.9 M (8.6 to 11.2 M). Total reads were 
combined for each cell line: 19,778,468 for the hTERT- 
HPNE and 17,769,249 for PANC-1 cells. No significant 
difference was observed in the number of reads gener- 
ated between the two cell lines (Student's £-test, P=0.53). 
Paired-end reads were mapped to the RefSeq database 
(National Center for Biotechnology Information (NCBI) 
build 37) using the Burrows-Wheeler Aligner (BWA) 
software with default parameters that allow up to 3 
alignments for each read and up to 2 mismatches for the 
seed sequence (the first 25 bp of each read) [15]. Reads 
that failed to map to RefSeq were mapped to the 
Ensembl database, which includes additional transcripts 
and pseudogenes [16]. Remaining unmapped reads were 
mapped to the human genome assembly (NCBI build 
37). The analysis pipeline is shown in Additional file 2: 
Figure SI. Gene expression was calculated in reads per 
kilobase of exon per million mapped sequence reads 
(RPKM) [17]. After filtering, we included 11,249 genes 
expressed above 1 RPKM in at least one cell line for fur- 
ther analysis. The edgeR package was used to identify 
genes that were differentially expressed in the two cell 
lines using total tag count values for each gene [18]. 
Data was normalized using the weighted trimmed mean 
of log-ratio values (using hTERT-HPNE as a reference) 



to account for library size; statistical analysis was 
performed using the Fisher's exact test. We considered 
genes to be differentially expressed between cell lines 
when the Benjamini-Hochberg false discovery rate 
(FDR) was less than 0.05 and the fold change difference 
in expression was equal to or greater than 3. 

miRNA-sequencing: After trimming adaptor sequences 
and filtering low-quality reads, a total of 9,262,094 and 
18,006,865 single-end short (22 bp) sequence reads were 
obtained from PANC-1 and hTERT-HPNE cells, respect- 
ively (Additional file 1: Table SIB). The miRNAkey soft- 
ware [19] was used to remove adaptor sequence from 
the 3 -ends of reads, to map raw reads to the miRBase 
mature database (release 16) [20], calculate normalized 
RPKM expression values and quantify differentially 
expressed miRNAs. Two hundred and fifty eight miRNA 
genes were detected at 1 RPKM or higher in at least one 
of the cell lines. Genes were considered differentially 
expressed with a fold change difference > 3 and a 
Bonferroni corrected Chi-square P-value < 0.05. 

Pathway enrichment analysis: Gene set enrichment ana- 
lysis of differentially expressed mRNA and miRNA genes 
was performed based on KEGG (Kyoto Encyclopedia of 
Genes and Genomes) [21] and GO (Gene Ontology) 
[22] annotations using GOseq, as it accounts for the 
bias of over-detection of differential expression for long 
and highly expressed transcripts [23]. The Wallenius ap- 
proximation method with Benjamini-Hochberg corrected 
.P-values was used to determine enriched pathways; and 
an FDR < 0.05 was considered significant. 

ChIP -sequencing: ChlP-seq single-end experiments 
were run on 2-3 lanes to obtain an adequate number of 
reads (Additional file 1: Table SIC). Matching input 
DNA was sequenced to obtain a reference for each cell 
line. The total number of short (35 bp) reads generated 
per lane varied from 4,509,035 to 27,577,446 with a me- 
dian of 19,279,525. Raw reads were combined for each 
histone modification mark and RNA Pol II as well as for 
input DNA for each of the cell lines. No significant dif- 
ference was observed in the number of reads generated 
in the two cell lines for the different histone modifica- 
tion marks or for RNA Pol II (Student's i-test, P-value 
range: 0.17-0.83). Alignment to the human genome as- 
sembly (NCBI build 37) was performed using Bowtie, 
allowing for up to 2 mismatches and only one (best) 
alignment [24]. The SICER software [25] was used to 
identify qualified peaks (islands) of histone and Pol II 
binding by comparing sequence reads from immunopre- 
cipitated and input DNA. A consolidating window size 
of 200 bps was used, a gap size of 200 bps (H3K4mel, 
H3K4me3 and Pol II) or 600 bps (H3K27me3), effective 
genome size of 81%, ratio of enrichment between experi- 
mental data (PANC-1) and control (hTERT-HPNE) >3 
and an FDR <0.05. Differentially enriched epigenetic 
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marks across the two cell lines were identified by the 
SICER-df function. A cis-regulatory element annotation 
system (CEAS) was used to attain summary statistics on 
ChIP enrichment peaks based on location in promoters, 
gene bodies or nongenic regions using the RefSeq data- 
base [26]. 

Integration of ChlP-seq and RNA-seq 

Genes were divided into quartiles based on digital expres- 
sion levels (RPKM values) for each cell line. Global profil- 
ing curves were generated for genes in each of the quartile 
groups by plotting the read distributions (tags were binned 
into 25 bps bins and trimmed based on Poisson distribu- 
tion) of different histone modification marks and RNA Pol 
II within 5,000 bp up- and downstream of transcription 
start sites (TSS) for RefSeq genes. 

Genomic copy number alterations 

To assess if a bias existed in read mapping for ChlP-Seq 
because of chromosomal copy number gains or losses, 
we extracted DNA (Gentra Puregene Tissue Kit, Qiagen) 
from the hTERT-HPNE and PANC-1 cell lines, and ge- 
notyped at NCI's Cancer Genomics Research Laboratory 
using the Illumina Human Hap lM-Duo chip. The 
Illumina GenomeStudio software was used to compute 
affinity-normalized probe intensities (normalized geno- 
type probe intensities, log R ratio (LRR) and B allele fre- 
quency (BAF)), quality scores and to call genotypes. 
Renormalized (quantile) LRR and BAF values were then 
analyzed using a custom software pipeline, R-Gada [27], 
to detect whole-chromosome and segmental events as 
previously described [28]. 

We only observed chromosomal copy number alter- 
ations in the PANC-1 cell line but not in the hTERT- 
HPNE cells. Adjacent events were merged if they had an 
identical state and distance between segments was <1 
Mb. Each event was then classified as copy number gain, 
loss or copy neutral loss of heterozygosity (CNLOH). 
Since we observed only 200 (0.49%) regions enriched for 
RNA Pol II binding and 41 (0.57%) regions enriched for 
H3K27me3 binding in regions that were lost in PANC-1 
cells, these were excluded from further analysis. To esti- 
mate a possible bias in mapping, we then calculated the 
number of regions enriched for RNA Pol II and 
H3K27me3 binding in the PANC-1 cells per 1 Mb for 
each autosomal chromosome for gain, CNLOH and copy 
neutral events. For interclass, unpaired comparisons, we 
used the Mann-Whitney U-test to assess differences in 
the distribution of genomic regions enriched for RNA 
Pol II and H3K27me3 binding in the PANC-1 cells. 

To assess if genes selected for replication by TaqMan 
expression analysis (see below) were located in regions 
of copy number gain or loss, we also genotyped the 
following cell lines and assessed chromosomal abnor- 



malities as described above: AsPC-1, BxPC-3, Capan-1, 
CFPAC-1, Hs 766T, SU.86.86 and SW 1990. 

Assessment of differential mRNA expression analysis 
using Real-Time qPCR 

An assessment of differential expression levels was 
attempted for 4 genes using custom mRNA Taqman ex- 
pression assays (Applied Biosystems): ALDH1A3 (HsOOl 
67476_ml), ITGBL1 (Hs00191224_ml), NFE2L3 (HsOO 
852569_gl) and SEMA4B (Hs00384240_ml). Fresh fro- 
zen pancreatic tissue samples (10 normal and 10 tumor 
derived) were obtained from the Mayo Clinic in Roches- 
ter MN (approved by the Institutional Review Board of 
the Mayo Clinic). All tumors were from patients diag- 
nosed as adenocarcinoma of the pancreas, with tumor 
percentages >70%; all normal derived samples were adja- 
cent to tumors (unmatched to tumors), confirmed by a 
pathologist to be normal by pathology and with >80% 
epithelial component. Eleven additional pancreatic can- 
cer cell lines were purchased from ATCC and cultured 
according to their guidelines: AsPC-1, BxPC-3, Capan-1, 
Capan-2, CFPAC-1, Hs 766T, HPAF-II, MIA PaCa-2, 
Pane 10.05, SU.86.86 and SW 1990. An additional 39 
pancreatic cancer cell lines (see Additional file 3: Table S6 
for names of cell lines and references) were kindly pro- 
vided by Dr. Udo Rudloff, Surgery Branch, NCI, NIH, and 
maintained in DMEM with 10% FBS. Frozen tissue sections 
(10 urn cut in at -20°C in a Cryostat) and cell lines (log 
phase) were harvested for RNA isolation as described 
above. Total RNA was reverse transcribed using Super- 
script III first-strand synthesis kit (Invitrogen). RT-qPCR 
was performed in triplicates using Taqman gene expres- 
sion Master Mix (Applied Biosystems) and a 7900HT se- 
quence detecting system (Applied Biosystems). The Cyo 
method was used to obtain the best fit estimators of reac- 
tion parameters for real-time PCR data [29]. The AACt 
method was used to calculate expression values by nor- 
malizing to the geometric mean of two housekeeping 
genes {B2M #Hs00187842_ml and PPIA #Hs9999990 
4_ml; Applied Biosystems). 

Results 

Using a genome-wide approach, we have assessed global 
gene expression profiles and regulatory elements in two 
cell lines derived from normal pancreatic ducts (hTERT- 
HPNE) and pancreatic ductal adenocarcinoma (PANC- 
1) by massively parallel sequencing-by-synthesis. 

Transcriptome analysis of pancreatic cell lines by mRNA- 
seq and miRNA-seq 

The total number of paired-end mRNA-sequence reads 
generated for each cell line is listed in Additional file 1: 
Table S1A. We constructed our mRNA-seq analysis 
pipeline to align reads sequentially to the RefSeq, 
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Ensembl and Human Genome (NCBI build 37) data- 
bases using BWA [15] (Additional file 2: Figure SI). By 
comparing the cancer derived PANC-1 cells to the nor- 
mal pancreatic tissue derived hTERT-HPNE cells, we 
identified 1,983 differentially expressed mRNA genes at 
a threshold of 3-fold difference or greater (Additional 
file 4: Table S2). Of these genes, 971 were expressed at 
higher levels and 1,012 at lower levels in the cancer de- 
rived cells compared to the normal derived pancreatic 
cells. Examples of mRNA genes that were expressed at 
higher levels in the PANC-1 cell line include growth fac- 
tors, receptors, and signaling molecules such as SHH, 
PDGFB, SMAD3, AKT2 and multiple WNT genes 
{WNT6, WNT7B, WNT9A and WNT10A). Genes 
expressed at lower levels in the PANC-1 cells include 
CDKN1A, CDKN2B, HHIP, PDGFRA, PDGFRB and 
MMP1. In addition, many genes encoding extracellular 
matrix (ECM) proteins and their receptors (integrins) 
were differentially expressed in the two cell lines. 

The total number of single-end miRNA sequence 
reads is listed in Additional file 1: Table SIB. We aligned 
reads to miRNA genes in the miRBase mature database, 
using miRNAKey [19]. Of the 258 miRNA genes 
expressed at 1 RPKM or higher, 128 were expressed at 
higher levels and 35 at lower levels in the cancer derived 
PANC-1 as compared to normal derived hTERT-HPNE 
cell line (Additional file 5: Table S3). Differential miRNA 
expression levels included higher levels of MIR767, 
MIR135B, MIR1269, MIR182, MIR183, and MIR203 and 
lower levels of MIR494, MIR424, MIR381, MIR452, and 
MIR15S in PANC-1 cells. 

We used KEGG [21] and GO [22] enrichment analysis 
to categorize differentially expressed mRNA and miRNA 



genes into biologically relevant pathways and processes. 
The fifteen significantly enriched KEGG pathway cat- 
egories included: pathways in cancer, focal adhesion, 
ECM-receptor interaction, Wnt signaling and cell adhe- 
sion molecules (Table 1). An enrichment of genes related 
to cardiomyopathy was also noted, but the list of genes 
overlapped considerably with two other pathways: focal 
adhesion and ECM-receptor interaction. Similarly, we 
detected significant enrichment of 56 GO biological 
process categories including: signal transduction, cell dif- 
ferentiation and cell adhesion (Additional file 6: Table 
S4). We also detected 25 GO molecular function cat- 
egories including: signal transducer activity, receptor 
activity, transcription factor activity, calcium ion bind- 
ing, actin binding and growth factor activity (Additional 
file 7: Table S5). 



Epigenome analysis of normal and cancer derived 
pancreatic cell lines by ChlP-seq 

To evaluate genome-wide distribution of epigenetic 
marks that associate with active or repressed expression, 
we performed ChlP-seq in the PANC-1 and hTERT- 
HPNE cell lines using antibodies specific for three his- 
tone modification marks: H3K4mel, H3K4me3 and 
H3K27me3, and for RNA Pol II. Reads (Additional file 1: 
Table SIC) were aligned to the human genome (NCBI 
build 37) using Bowtie [24]. Binding sites for histone 
modification marks and RNA Pol II were identified by 
comparing immunoprecipitated chromatin with input 
DNA using SICER [25]. The total number of all binding 
sites for H3K4mel, H3K4me3, H3K27me3, and RNA 
Pol II identified in each cell line at a FDR of <0.001 was 



Table 1 KEGG pathway enrichment analysis for mRNA and miRNA genes differentially expressed in tumor and normal 
derived pancreatic cell lines 



KEGG pathway ID 


KEGG pathway name 


P-value 


FDR 


# of genes 


05200 


Pathways in cancer 


5.40 x 10™ 


1.15 x 10™ 


69 


04510 


Focal adhesion 


1.05 x 10~" 


1.12 x 10" 09 


61 


04512 


ECM-receptor interaction 


1.11 x 10" 14 


2.37 x 10" 12 


36 


04060 


Cytokine-cytokine receptor interaction 


3.43 x 10" 07 


1.22 X 10" 05 


36 


04310 


Wnt signaling pathway 


6.26 x 10™ 


8.75 x 10™ 


36 


04360 


Axon guidance 


3.82 x 10™ 


9.05 x 10™ 


33 


04514 


Cell adhesion molecules (CAMs) 


2.27 x 10" 08 


9.67 x 10™ 


29 


04020 


Calcium signaling pathway 


1.47 x 10" 05 


2.61 x 10™ 


29 


04080 


Neuroactive ligand-receptor interaction 


1.78 X 10™ 


947 x 10™ 


26 


05412 


Arrhythmogenic right ventricular cardiomyopathy (ARVC) 


1.44 X 10™ 


9.47 x 10™ 


24 


05414 


Dilated cardiomyopathy 


3.82 x 10™ 


9.05 x 10™ 


22 


05410 


Hypertrophic cardiomyopathy (HCM) 


2.64 x 10" 06 


8.02 x 10" 05 


21 


04640 


Hematopoietic cell lineage 


1.47 X 10" 05 


2.61 x 10™ 


16 


05217 


Basal cell carcinoma 


6.58 x 10™ 


8.75 X 10™ 


15 


04610 


Complement and coagulation cascades 


1.01 x 10™ 


1.27 x 10™ 


10 



KEGG pathway ID numbers are listed and statistical significance is shown by P values and FDR values. 
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101,797 for the PANC-1 cell line and 115,371 for the 
hTERT-HPNE cell line (Additional file 1: Table SIC). 

The most prominent binding sites in promoters (up to 3 
kb upstream of known transcription start sites) were seen 
for H3K4me3 (26.6% of all sites in the hTERT-HPNE cell 
line and 27.3% in the PANC-1 cell line), in accord with its 
role in activation of gene expression. A similar pattern of 
binding sites was seen for H3K4me3 and RNA Pol II in 
the two cell lines, particularly in exons (range 2.7-3.1%), 
untranslated regions (range 2.8-4.0%) and introns (41.2- 
41.4% for RNA Pol II and 49.2-51.9% for H3K4me3). 
Similar overall binding patterns were noted for H3K4mel 
and H3K27me3 in both cell lines with regard to promoters 
(4.1-7.6%), exons (0.9-1.8%) and untranslated regions 
(0.7-1.4%). A greater number of intronic binding sites 
were seen for H3K4mel (54.3-56.7%) as compared to 
H3K27me3 (36.8-38.1%), and fewer sites were noted in 
distal intergenic regions for H3K4mel (28.9-37.6%) as 
compared to H3K27me3 (50.5-51.5%) (Additional file 8: 
Figure S2). 



Integration of ChlP-seq and mRNA-seq 

To correlate enrichment for histone modification marks 
to gene expression, we divided genes into four equal 
subsets based on digital gene expression levels (in 
RPKM) from highest to lowest. We then summarized se- 
quence reads from ChlP-seq relative to transcription 
start sites (TSS) (+/- 5,000 bps) for all RefSeq genes 
(Figure 1 for the PANC-1 cell line and Additional file 9: 
Figure S3 for the hTERT-HPNE cell line) to assess their 
patterns in promoter elements of genes expressed at dif- 
ferent levels. Read density for H3K4mel and H3K4me3 
was increased surrounding the TSS of highly expressed 
genes in both cell lines, especially for the latter histone 
mark. The distribution of reads is bimodal, with an in- 
creased number of reads upstream and downstream of 
the TSS and a valley in-between. The peaks are located 
at approximately 600-800 bps upstream and 1,200-1,400 
bps downstream of the TSS for H3K4mel and at ap- 
proximately 300-400 bps up- and downstream of the 
TSS for H3K4me3 with the latter being more prominent. 




t 1 r 



Relative Position of TSS (bp) 



H3K4me3 
Pol II 



B 




D 



S 8 



H3K4mel 
H3K27me3 



Relative Position of TSS (bp) 



Figure 1 Distribution of histone modification marks and RNA Polymerase II binding sites according to gene expression levels for 
RefSeq genes. Genes were divided into four categories according to gene expression values in RPKM from high (A), medium (B), low (C) to very 
low (D). Density traces for histone modification sequence tags were then graphed according to the start of transcription (TSS) for each of the 
four groups. Density traces are labeled in green for H3K4me1, red for H3K4me3, blue for H3K27me3 and purple for RNA Pol II (see panel insert). 
Results are shown for the PANC-1 cell line; similar results were seen for the hTERT-HPNE cell line (Additional file 9: Figure S3). 
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Figure 2 Distribution of the histone modification marks and RNA Polymerase II binding sites relative to transcriptional start sites (TSS) 
for RefSeq genes. Genes were divided into four categories based on digital gene expression (RPKM values) from high to very low. Sequence 
reads were then graphed relative to the TSS for each group. Density traces are labeled in blue for genes expressed at the highest levels, red for 
medium levels, green for low levels and purple for very low levels (see panel insert). Results are shown for the PANC-1 cell line (panels (A) for 
H3K27me3, (C) for H3K4me3, (E) for H3K4me1 and (G) for RNA Pol II) and for the hTERT-HPNE cell line (panels (B) for H3K27me3, (D) for 
H3K4me3, (F) for H3K4mel, and (H) for RNA Pol II). Note the "dip" in H3K27me3 sequence tags around the TSS of highly expressed genes in 
PANC-1 cells. 
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RNA Pol II binding sites were prominent over highly 
expressed genes with a peak of greatest density right 
around the TSS. Genes expressed at lower levels gradually 
lost binding of histones modified at lysine 4 and RNA Pol 
II (Figure 1, Figure 2, Additional file 9: Figure S3). In 
contrast, these genes are characterized by a relatively high 
density of reads for H3K27me3 with the greatest en- 
richment at approximately 200-300 bps downstream of 
the TSS (Figure 2A and B). Notably, when comparing 
epigenetic marks across the two cell lines, the greatest 
fraction of peaks enriched >3 fold in the PANC-1 cells 
was seen for RNA Pol II (71.8% increased, 28.2% de- 
creased) and the greatest number of peaks decreased >3 
fold in the PANC-1 cells was seen for the negative 
H3K27me3 histone modification mark (81.0% decreased, 
19.0% increased), implying that the cancer cells may 
be better poised for active transcription (Table 2A). 
This appears to happen on a genome wide level and 
can be seen in Figure 2 where H3K27me3 sequence 
tags "dip" around the TSS of highly expressed genes in the 
PANC-1 cell line (Figure 2A) but not the hTERT-HPNE 
line (Figure 2B). 

To verify that this differential enrichment in RNA Pol 
II and H3K27me3 binding sites was not caused by a bias 
in read mapping due to differences in DNA copy num- 
ber, we genotyped the cell lines using the Illumina 
Human Hap 1M chip. We noted chromosomal alter- 
ations in the PANC-1 cells but not in the hTERT-HPNE 
cells. To assess mapping bias, we compared binding sites 



for RNA Pol II and H3K27me3 in genomic regions in 
the PANC-1 cells showing either gain or copy neutral 
loss of heterozygosity (CNLOH) relative to copy neutral 
genomic regions. We did not observe statistically signifi- 
cant differences in the distribution of regions enriched 
for RNA Pol II binding in regions showing copy number 
gain (£=0.52) or CNLOH (P=0.11) as compared to copy 
neutral genomic regions. Similarly, we did not observe 
differences in the distribution of regions enriched for 
H3K27me3 binding in regions showing gain (P=0.61) or 
CNLOH (P=0.89) as compared to copy neutral genomic 
regions, suggesting that the observed differences in epi- 
genetic marks in the two cell lines are not due to a dif- 
ference in copy number alterations. 

Overexpression of ALDH1A3 in pancreatic cancer cell 
lines and tumors 

We used the transcriptome and epigenome data de- 
scribed above to select genes for assessment of expres- 
sion differences in independent sets of normal and 
tumor derived pancreatic tissue samples and cell lines. 
We based our selection of genes on the strong differ- 
ences in RNA Pol II and H3K27me3 binding sites in the 
two cell lines by applying the following criteria: 1.) 
located within 5 kb of a region enriched > 3 fold for 
RNA Pol II binding in the PANC-1 cells, 2.) located 
within 5 kb of a region enriched > 3 fold for H3K27me3 
binding in the hTERT-HPNE cells, and 3.) showed > 3 
fold increased expression in the PANC-1 as compared to 



Table 2 Characteristics of epigenetic marks that were increased or decreased over 3 fold in the PANC-1 as compared 
to the hTERT-HPNE cell line 



A. Epigenetic marks that were enriched or depleted in PANC-1 as compared to hTERT-HPNE pancreatic cells 






Increased in PANC-1 cells 


Decreased in PANC-1 cells 


Mark 


Fold change 


# Increased 


% Increased 


# Decreased 


% Decreased 


RNA Pol II 


> 3 


42,809 


71.8% 


1 6,841 


28.2% 


H3K4Me3 


> 3 


4,434 


53.7% 


3,828 


46.3% 


H3K4me1 


> 3 


1 3,999 


26.1% 


39,556 


73.9% 


H3K27Me3 


> 3 


2,474 


1 9.0% 


10,521 


81.0% 


B. Differential epigenetic marks (change > 3 fold) in the ALDH1A3 gene in PANC-1 and hTERT-HPNE pancreatic cell lines 








RNA Pol II 




H3K27me3 


Gene symbol 






ALDH1A3 




ALDH1A3 


Chromosome 






chr15 




chr15 


Start (bp) 






101,420,000 




101,435,600 


End (bp) 






101,420,199 




101,439,799 


PANC-1 readcount 






1.119 




1.181 


hTERT-HPNE readcount 






0.065 




5.404 


Fold change 






12.39 




0.23 


P value 






1.08 x 10" 13 




8.55 x 1 0' 24 


FDR 






3.16 x 10" 13 




1 .27 x 1 0' 22 



A. Number and percentages of epigenetic marks that were increased or decreased over 3 fold in the PANC-1 as compared to the hTERT-HPNE cell lines. 

B. Epigenetic marks in the ALDH1A3 gene are listed with location (NCBI build 37) of the respective epigenetic mark, normalized read counts, fold change and 
significance values [P value and False Discovery Rate, FDR). 



Jia ef al. BMC Medical Genomics 201 3, 6:33 
http://www.biomedcentral.eom/1755-8794/6/33 



Page 9 of 1 3 



hTERT-HPNE cells. This resulted in a total of 60 genes 
(labeled in bold in Additional file 4: Table S2). We then 
selected 4 genes from this list based on varying levels of ex- 
pression differences and location in genomic regions that 
either showed copy-number gain (ALDH1A3, SEMA4B) or 
did not (ITGBL1, NFE2L3) in PANC-1 cells. We first 
assessed gene expression by RT-qPCR in 10 normal and 10 
tumor derived pancreatic tissue samples as well as in 11 
additional pancreatic cancer cell lines. By selecting differ- 
entially expressed genes that were also enriched in positive 
marks and depleted in negative marks we confirmed differ- 
ential expression of one of the four genes, ALDH1A3, in 
this set of tumor and normal derived pancreatic tissues 
and cell lines (Figure 3, panels A and B). This gene was 
expressed on average 1.93 fold higher in the tumor derived 
samples as compared to the normal derived pancreatic tis- 
sue samples {PMann-whitney u test =0.034), and 3 of the 10 
tumor samples had over 3 fold higher expression than the 
average of the normal samples (range 3.32-4.10). The 
ALDH1A3 gene was expressed on average 1,308 fold 
higher in the tumor cell lines as compared to the normal 
derived hTERT-HPNE cells, and 23.5 fold higher in the 
tumor cell lines as compared to the average of the normal 
derived tissue samples. The difference between ALDH1A3 
expression levels in the tumor derived cell lines as com- 
pared to the tissue samples may be due to the higher de- 
gree of cellular heterogeneity in the latter set. Interestingly, 
by analyzing copy number alterations, we noted that its ex- 
pression is increased in cell lines showing copy number 
gain of the genomic region it resides in (PANC-1, SU.86.86 
and BxPC-3) as well as in cell lines without gain (Hs 766T, 
AsPC-1, Capan-1, SW 1990 and CFPAC-1) as shown in 
Figure 3B. We confirmed differential expression for the 
three other genes in the hTERT-HPNE and PANC-1 cell 
lines by RT-qPCR, but did not observe significandy differ- 
ential expression in the larger set of cell lines and tissue 
samples. A second validation in an independent set of 39 
additional pancreatic cancer cell lines (Additional file 3: 
Table S6) showed that ALDH1A3 was expressed on average 
1,630 fold higher in the tumor cell lines as compared to 
hTERT-HPNE cells, and 78.5 fold higher in the tumor cell 
lines as compared to the average of the normal derived tis- 
sue samples (Figure 3C). Increased RNA Pol II binding 
was seen in the promoter region of ALDH1A3 (12.39 fold), 
and decreased H3K27me3 binding (0.23 fold) in a region 
overlapping and surrounding exons 7 and 8, in the PANC- 
1 as compared to the hTERT-HPNE cells (Table 2B). In 
addition, binding of the negative mark H3K27me3 featured 
prominently over the first half of the gene in hTERT- 
HPNE cells (Additional file 10: Figure S4). 

Discussion 

Generating global datasets of transcribed sequences and 
epigenetic marks can form a foundation for genomic 



and functional investigations of the molecular mecha- 
nisms underlying specific cancers. Such datasets have 
been generated for many tissue types as part of the 
ENCODE project [30]. To begin to establish genomic 
datasets for the pancreas we analyzed expressed se- 
quences and signatures of functional elements in normal 
and pancreatic cancer derived cell lines by massively 
parallel sequencing. We used a diploid cell line derived 
from normal pancreatic ducts and immortalized with 
the TERT gene, hTERT-HPNE, and a commonly used 
hypertriploid pancreatic ductal carcinoma derived cell 
line, PANC-1, for our studies. In this process, we identi- 
fied differentially expressed genes that define pathways 
and cellular processes important for cancer, such as 
growth factor and receptor activity, signal transduction, 
focal adhesion, ECM-receptor interaction and cell differ- 
entiation. Enrichment of genes in these pathways reiter- 
ates the increased growth potential and diminished 
adhesive properties associated with the development and 
progression of pancreatic cancer. 

Many of the differentially expressed genes encode 
ligands, receptors and signaling molecules in the wing- 
less and sonic hedgehog signaling pathways, implicated 
in embryonic development and particular cancers, in- 
cluding that of the pancreas [31-33]. Genes important 
for focal adhesion and extracellular matrix receptor 
interaction were also prominent among differentially 
expressed genes featuring numerous integrin, collagen 
and laminin subtypes possibly indicating an underlying 
potential for altered proliferation and interaction with 
the microenvironment of the tumor derived cell line in 
an in vivo setting. MicroRNA expression has not been 
assessed before in pancreatic cancer by next generation 
sequencing methods. Our results in the two cell lines 
agree with previously published data on the upregulation 
of MIR93, MIR95, MIR135B, MIR181C, MIR181D, 
MIR182, MIR183, MIR 190, MIR196B and MIR203, and 
downregulation of MIR20A and MIR29C in pancreatic 
intraepithelial neoplasms (Panlns) or pancreatic adeno- 
carcinomas (PDACs) as compared to normal pancreatic 
tissue [34-36]. In agreement, we noted substantially 
higher (from 6 to over 200 fold) expression of the 
former group of genes and lower (3-18 fold) expression 
of the latter group in our dataset. MIR 13 SB, MIR182 
and MIR183 are overexpressed in a wide range of other 
cancer types such as bladder, colon, prostate cancer and 
glioma [37-41]. In addition, we noted differential expres- 
sion of miRNA genes not previously reported to be 
deregulated in cancer. Some of the largest differences 
were observed for MIR767 and MIR1269, both absent in 
the hTERT-HPNE cells but highly expressed in the can- 
cer derived PANC-1 cells. The function of MIR767 and 
MIR1269 is largely unknown although the latter may be 
important during differentiation of human embryonic 
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(See figure on previous page.) 

Figure 3 Expression of the ALDH1A3 gene in pancreatic cell lines and tissue samples. Expression levels for ALDH 1 A3 were determined by 
RT-qPCR in (A) normal derived pancreatic tissue samples (N-1 though N-10) and tumor derived pancreatic tissue samples (T1 through T10), as 
well as (B) in pancreatic normal (n=1) and tumor (n=12) derived cell lines. Expression levels were normalized to the geometric mean of B2M and 
PPIA and are shown in arbitrary units (A.U.). Chromosomal gain of the ALDH1A3 gene region is indicated below panel (B) as G: copy number gain, 
N: no gain. Copy number alterations were not assessed for cell lines marked with (C) ALDH1A3 expression levels were assessed in additional 
pancreatic cancer cell lines (n=39) by RT-qPCR. 



stem cells [42]. Our results confirm previous findings 
suggesting that altered miRNA expression contributes to 
cancerous growth in the pancreas and provide new can- 
didate miRNA genes that require validation in larger 
sample sets of normal and cancerous pancreatic tissues. 

Epigenomic assessment of histone modification pat- 
terns revealed an enrichment of mono- and tri- 
methylated H3K4 in promoters that was highest in 
actively transcribed genes and gradually lost with lower 
expression. The transcriptional start site of actively 
transcribed genes was occupied by RNA polymerase, 
surrounded by a layer of nucleosomes with tri- 
methylation at H3K4 and an outer core of nucleosomes 
with mono-methylation at H3K4. Trimethylation at 
H3K27 showed an inverse correlation with gene expres- 
sion. This pattern of modification marks is in accord- 
ance with the activating and silencing effects associated 
with these histone modifications in other cell types 
[43,44]. Notably, an overall depletion of H3K27me3 
marks and enrichment of RNA Pol II sites was seen in 
the cancer derived cell line, as compared to the normal 
derived cell line, indicating a possible mechanism of gen- 
eral alleviation of negative epigenetic regulation in pan- 
creatic cancer. By focusing on this apparent difference 
between epigenetic marks in the two cell lines when 
selecting genes for further examination, we confirmed 
higher levels of ALDH1A3 gene expression in an inde- 
pendent set of tumor derived pancreatic tissues and cell 
lines in comparison to normal derived pancreatic sam- 
ples. Of note is that of the genes selected for further as- 
sessment, all four showed very similar results in the 
initial set of cell lines using RT-qPCR, whereas only one 
showed a similar expression difference in the additional 
sample sets analyzed. This is probably due to the small 
initial sample set. Validation in a second larger set of 
pancreatic cancer cell lines confirmed higher expression 
levels of ALDH1A3 as compared to the normal derived 
cell line and normal derived tissue samples. A limitation 
of our study is the small number of samples, especially 
for normal derived pancreatic cell lines and for normal 
and tumor derived tissue samples. 

A general function for members of the aldehyde de- 
hydrogenase (ALDH) superfamily is to serve as detoxifi- 
cation enzymes by converting aldehydes to carboxyl 
acids (for review see [45]). The different ALDH 
subfamilies and their members have additional catalytic 



functions; the ALDH 1 A family is responsible for the 
conversion of retinaldehyde to retinoic acid (RA) and its 
three members (ALDH1A1 through ALD1A3) are thus 
important regulators of RA signaling. Increased aldehyde 
dehydrogenase activity is commonly seen in many can- 
cer types where it appears to specifically mark cancer 
stem cells [46,47]. These cells tend to be highly 
tumorigenic and invasive, and are often resistant to con- 
ventional therapy [48]. The presence of aldehyde de- 
hydrogenase positive cells in tumor from patients with 
pancreatic cancer has been associated with decreased 
survival, and ALDH1A1 expression (ALDH1A3 was not 
tested) correlated with invasion of pancreatic cancer cell 
lines [49]. Of the 19 ALDH isoforms, ALDH1A1 has 
most often been linked to the increased ALDH activity 
of cancer stem cells [50]. However, ALDH1A3 expres- 
sion has been shown to correlate better with aldehyde 
activity in breast cancer stem cells than ALDH1A1 ex- 
pression, and correlate with tumor grade, stage and me- 
tastasis [51]. Although we confirmed increased mRNA 
levels of ALDH1A3 in an independent set of pancreatic 
tissue samples and two series of cell lines, our finding 
should be validated in additional sample sets to thor- 
oughly investigate the relationship between ADH1A3, 
cancer stem cells and pancreatic cancer. 

Conclusion 

The high resolution and read depth of next generation 
sequencing has allowed us to accurately quantify gene 
expression and locate epigenetic marks, thus enabling a 
detailed analysis of the pancreatic cancer genome. Al- 
though limited by a small set of samples, this study has 
thoroughly catalogued the transcriptome and epigenome 
of two pancreatic cell lines, confirmed previous genes 
and pathways deregulated in pancreatic cancer and gen- 
erated a series of hypotheses worth following-up in fu- 
ture studies to investigate key events in pancreatic 
carcinogenesis. 

Additional files 



Additional file 1: Table SI. Number of reads generated by mRNA-seq 
(A), miRNA-Seq (B), and ChlP-seq (C) experiments and their alignment 
characteristics. The number of peaks from each ChlP-seq experiment is 
shown in (C). 
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Additional file 2: Figure SI. RNA-seq analysis pipeline for read 
alignment and digital gene expression quantification. 

Additional file 3: Table S6. Additional cell lines used for ALDHIA3 
expression assessment by RT-qPCR. Names of pancreatic cancer cell lines, 
references and additional information on how they were obtained. 

Additional file 4: Table S2. Differentially expressed mRNA genes in 
tumor and normal derived pancreatic cell lines. mRNA genes with a 3 
fold or greater expression difference (n=1,983) are listed, as are expres- 
sion values in RPKM and FDR values. Genes that are expressed >3 fold 
higher in the PANC-1 cells, located within 5 kb of regions enriched >3 
fold in RNA Pol II binding in the PANC-1 cells, and enriched >3 fold in 
H3K27me3 binding in the hTERT-HPNE cells are marked in bold text. 

Additional file 5: Table S3. Differentially expressed miRNA genes in 
tumor and normal derived pancreatic cell lines. miRNA genes with a 3 
fold or greater expression difference (n=163) are listed, as are expression 
values in RPKM and FDR values. 

Additional file 6: Table S4. GO analysis for enrichment of biological 
processes (GO BP) for mRNA and miRNA genes differentially expressed in 
tumor and normal derived pancreatic cell lines. Statistical significance is 
shown by P values and FDR values. 

Additional file 7: Table S5. GO analysis for enrichment of molecular 
function (GO MF) for mRNA and miRNA genes differentially expressed in 
tumor and normal derived pancreatic cell lines. Statistical significance is 
shown by P values and FDR values. 

Additional file 8: Figure S2. Distribution of histone modification peaks 
and RNA Pol II binding sites according to locations within promoters, 
regions downstream of genes, 5' and 3' untranslated regions (UTR), 
coding exons, introns and distal intergenic regions of the genome for 
the hTERT-HPNE (A) and PANC-1 (B) cells by analysis with the CEAS 
software. 

Additional file 9: Figure S3. Distribution of histone modification marks 
and RNA Polymerase II binding sites according to gene expression levels 
for RefSeq genes in the hTERT-HPNE cell line. Genes were divided into 
four categories according to gene expression values (RPKM) from high 
(A), medium (B), low (C) to very low (D). Density traces for histone 
modification sequence tags were then graphed according to the start of 
transcription (TSS) for each of the four groups. Density traces are labeled 
in green for H3K4mel, red for H3K4me3, blue for H3K27me3 and purple 
for Pol II. 

Additional file 10: Figure S4. Distribution of sequence reads for RNA 
Pol II and H3K27me3 in the ALDH1A3 gene in PANC-1 and hTERT-HPNE 
cells. Note increased RNA Pol II binding in the promoter of ALDH1 A3 in 
PANC-1 cells, and increased H3K27me3 binding in first half of the gene in 
the hTERT-HPNE cells. 
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